High-throughput image processing software for the study of nuclear architecture and gene expression

High-throughput imaging (HTI) generates complex imaging datasets from a large number of experimental perturbations. Commercial HTI software programs for image analysis workflows typically do not allow full customization and adoption of new image processing algorithms in the analysis modules. While open-source HTI analysis platforms provide individual modules in the workflow, like nuclei segmentation, spot detection, or cell tracking, they are often limited in integrating novel analysis modules or algorithms. Here, we introduce the High-Throughput Image Processing Software (HiTIPS) to expand the range and customization of existing HTI analysis capabilities. HiTIPS incorporates advanced image processing and machine learning algorithms for automated cell and nuclei segmentation, spot signal detection, nucleus tracking, nucleus registration, spot tracking, and quantification of spot signal intensity. Furthermore, HiTIPS features a graphical user interface that is open to integration of new analysis modules for existing analysis pipelines and to adding new analysis modules. To demonstrate the utility of HiTIPS, we present three examples of image analysis workflows for high-throughput DNA FISH, immunofluorescence (IF), and live-cell imaging of transcription in single cells. Altogether, we demonstrate that HiTIPS is a user-friendly, flexible, and open-source HTI software platform for a variety of cell biology applications.


Image loading and visualization
HiTIPS is designed for easy setup and execution of HTI image analysis pipelines. Users first optimize analysis parameters interactively on a subset of representative images, aided by visual feedback in which HiTIPS overlays the image analysis results on the original images. Once this optimization is complete, users can run the analysis in batch mode on the whole image dataset. Neither the interactive setup of the analysis modules nor the launch of the batch analysis pipeline requires programming, thanks to a GUI that is used for data loading, for image and results visualization, and for the choice of the image analysis module parameters (Fig. 1A).
HiTIPS allows users to select HTI imaging datasets and to load selected data on demand, thus eliminating the need to retain the entire dataset in memory (Fig. 1A). This enables swift, efficient access to extensive image datasets while minimizing memory requirements for processing. As an example, a 4-channel image from an HTI dataset can typically be loaded in less than 1.2 s on different hardware platforms (Supplementary Note 1). In addition, HiTIPS uses either a generic Bio-Formats reader 20, which allows the loading and conversion of more than 120 different imaging formats, or image acquisition metadata (well position, field of view (FOV), channel, etc.) automatically generated by the microscope and saved in separate XML files. While this second mechanism is currently only implemented for the CellVoyager format and for imaging datasets generated by Micro-Manager, a popular open-source microscope control and image acquisition software 21, the open-source and modular nature of HiTIPS allows the future extension of metadata reading to other instruments and formats, potentially including the recently developed OME-ZARR format 22. Thanks to the use of image acquisition metadata, users can rapidly select specific wells, FOVs, and/or channels to load single merged FOVs in the viewer for visual inspection of the images and for optimization of the image analysis parameters (Fig. 1A).
Once the images are loaded, users can perform a series of routine changes to their visualization, including toggling specific channels on or off, showing a z-projected version of the image if the FOV is present as a 3D z-stack, and independently adjusting minimum and maximum intensity levels for each of the channels (Fig. 1B). Visual inspection of random wells and FOVs in the dataset is often an essential quality control step before setting up an HTI image analysis pipeline, and it is greatly facilitated by the rapid loading and rendering of the images by HiTIPS. Furthermore, the image visualization interface is not limited to the original images, but also includes the overlaid presentation of object masks and borders generated by the different image analysis modules, which can be selected and whose parameters can be modified in another window of the GUI (Fig. 1C). This is an essential feature that enables rapid cycles of parameter optimization during the interactive image analysis setup phase. Finally, after configuring the analysis parameters in interactive mode, HiTIPS allows users to choose the number of parallel processing threads for batch analysis depending on the technical specifications of the hardware on which the application is running, either locally or on an HPC cluster (see Methods and Supplementary Note 1 for details on hardware configurations).

HTI image analysis workflow
While HiTIPS was built to analyze a variety of HTI assays, and can be further extended or customized by a developer to accommodate additional specific analysis needs, we used genome architecture and gene expression assays in both fixed and live cells as model systems for our initial development efforts. For this reason, the HiTIPS analysis workflow currently includes the sequential use of image and metadata loading (Fig. 2, i), nuclei segmentation (Fig. 2, ii), fluorescent spot finding (Fig. 2, iii), nuclei tracking (Fig. 2, iv), nuclei and spot patch generation (Fig. 2, v), nuclei and spot patch registration (Fig. 2, vi), spot assignment to a track (Fig. 2, vii), measurement of fluorescence intensity (Fig. 2, viii), and 2-state Hidden Markov Model (HMM) fitting to segment fluorescence intensity traces (Fig. 2, ix). While the modular structure of HiTIPS allows the adoption of existing state-of-the-art algorithms and models, such as those for nuclear segmentation, tracking, and integrated spot fluorescence calculations, several of the algorithms used in these pipelines, including spot finding, nucleus registration, and spot tracking, are novel and optimized for live cell imaging of gene expression (see Methods and Supplementary Note 2). HiTIPS also allows the selection of the workflow steps only up to the spot finding module (Fig. 2, i-iii), or up to the nucleus tracking module for live-cell HTI assays that do not require spot level measurements (Fig. 2, i, ii, and iv). This selection can be performed by toggling specific modules on or off in the GUI (Fig. 1A) during the interactive setup phase of the image analysis workflow. Usage instructions and detailed documentation for all the analysis modules and algorithms implemented can be found at the online documentation page for HiTIPS and in Supplementary Note 2.
To show the utility of HiTIPS across a broad spectrum of data types and applications related to the biology of the cell nucleus, we applied it to three distinct HTI assays: (1) measurement of 3D physical distances between two genomic loci visualized with DNA FISH probes, (2) estimation of spatial clustering in the nucleus of centromeres labeled by IF with an antibody to the centromeric protein CENPC, and (3) a set of high-throughput measurements of transcriptional activity in live cells of the endogenous KPNB1 or ERRFI1 genes labeled with the fluorescent MS2/MCP-GFP system.

HiTIPS measures spatial distances between genomic loci in high-throughput fashion
Mammalian genomes are spatially organized in the cell nucleus at several hierarchical levels, and 3D genome organization is tightly correlated with many nuclear functions such as transcription, replication, and DNA damage repair 23. A prominent feature of genome organization is the presence of Topologically Associating Domains (TADs), genomic regions that exhibit an enhanced propensity for mutual interaction across relatively large genomic distances (200 kb-1 Mb) 23,24. At the molecular level, one of the key factors for TAD establishment and maintenance is the cohesin complex, as demonstrated by the observation that acute depletion of the RAD21 cohesin subunit leads to loss of these domains as measured by biochemical chromatin conformation capture techniques 25, and to an increase in physical distances between adjacent TADs as measured by DNA FISH imaging 26.
We wanted to test HiTIPS in a high-throughput DNA FISH assay to see whether we could measure changes in the physical distances between the boundaries of the TAD containing the human EGFR gene on Chr 7 upon acute depletion of RAD21 (Fig. 3A). To this end, we performed high-throughput DNA FISH imaging experiments in HCT116-RAD21-AID cells, where RAD21 can be rapidly degraded by the cellular ubiquitin/proteasome machinery upon binding of the AID degron domain to Auxin as previously described 27 (Fig. 3A). We used automated confocal imaging to acquire z-stack images of HCT116-RAD21-AID cells treated for 3 h with Auxin, or mock-treated, in 3 channels (DAPI, EGFR TAD 5' boundary/Probe A, EGFR 3' boundary/Probe B, Fig. 3B). 3D image stacks were analyzed with HiTIPS in batch by segmenting nuclei using the DAPI image, by finding the position of FISH spots in 3D, and by calculating minimum distances between FISH spot centers in the two different channels on a per allele basis (Fig. 2, i-iii). After plotting the distributions of minimum distances between the genomic loci at the base of the loop domain in 1874 cells in either Auxin-treated or mock-treated control cells, we observed that RAD21 degradation upon Auxin treatment led to a statistically significant increase in the distance between TAD boundaries (Fig. 3C, p < 2e−16, Wilcoxon test). These results show that HiTIPS can be used for the automated analysis of 3D distances measured from DNA FISH images.
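The per-allele minimum distance calculation described above can be illustrated with a minimal sketch. This is not the HiTIPS implementation: the function name `min_interprobe_distances`, the `(z, y, x)` coordinate order, and the `voxel_size` parameter are assumptions introduced here for illustration.

```python
import numpy as np
from scipy.spatial.distance import cdist


def min_interprobe_distances(spots_a, spots_b, voxel_size=(1.0, 1.0, 1.0)):
    """For each Probe A spot, return the distance to the closest Probe B spot.

    spots_a, spots_b: (n, 3) sequences of (z, y, x) spot centers, in voxel
    units, detected in the same nucleus (hypothetical input layout).
    voxel_size: physical (z, y, x) voxel dimensions, used to convert voxel
    coordinates to physical units before measuring distances.
    """
    a = np.asarray(spots_a, dtype=float).reshape(-1, 3) * np.asarray(voxel_size)
    b = np.asarray(spots_b, dtype=float).reshape(-1, 3) * np.asarray(voxel_size)
    if len(a) == 0 or len(b) == 0:
        return np.empty(0)  # no pair can be formed in this nucleus
    # Pairwise Euclidean distances, then the minimum over Probe B per A spot.
    return cdist(a, b).min(axis=1)


probe_a = [(0, 0, 0), (0, 10, 10)]
probe_b = [(0, 3, 4), (2, 10, 10)]
print(min_interprobe_distances(probe_a, probe_b, voxel_size=(0.5, 1.0, 1.0)))
```

Scaling by the voxel size matters for confocal z-stacks, where the axial step is usually different from the lateral pixel size.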

Clustering analysis of centromeres in the nucleus
Centromeres are specialized genomic regions that assemble the kinetochore, a large protein complex consisting of several components, including the evolutionarily conserved CENPC protein 28. Kinetochores physically connect chromosomes to microtubules and ensure high-fidelity genome segregation during cell division 29. Centromere positions within the 3D space of the cell nucleus vary across species 30. Recently, it was shown, using biochemical techniques and traditional low-throughput fluorescence microscopy, that loss of the condensin II complex subunit NCAPH2 leads to centromere clustering in human cancer cells 31.
We tested whether we could use HiTIPS to measure spatial clustering of centromeres at the single-nucleus level (Fig. 4). To this end, we reverse-transfected HCT116-Cas9 cells with siRNA oligos against the NCAPH2 gene, or a scrambled negative siRNA control, in 384-well plates. Cells were fixed and stained with DAPI for nuclei segmentation and with a CENPC-specific antibody to visualize centromeres. Stained cells were then imaged in 3D (Fig. 4A), and maximally projected images were analyzed using HiTIPS for nuclear segmentation and spot finding/localization (Fig. 2, i-iii). HiTIPS was able to precisely detect and localize CENPC spots in cell nuclei, even in regions with a high density of CENPC spots (Fig. 4A). More importantly, the single CENPC spot position datasets and the nuclei ROIs could be used in a separate analysis to calculate a centromeric clustering score in single cells. We defined this score as the percentage of the measured curve for Ripley's K function 32 that is above the curve for the random Poisson distribution (see Materials and Methods for details). This clustering analysis showed that, in line with visual inspection, cells transfected with siNCAPH2 had higher clustering scores (Fig. 4B) and fewer distinguishable centromeres (Fig. 4C) than cells transfected with the control scrambled siRNA. These results show the utility of HiTIPS in the analysis of IF-based HTI assays at the single-cell level, and they confirm previous traditional fluorescence microscopy-based results 31, while expanding the phenotypic analysis from a few cells to thousands of cells.
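The clustering score defined above can be sketched as follows. This is an illustrative simplification rather than the analysis code used for Fig. 4: it assumes 2D spot coordinates, a naive Ripley's K estimator without edge correction, and the theoretical Poisson curve K(r) = πr²; the function names and the choice of radii are hypothetical.

```python
import numpy as np


def ripley_k(points, radii, area):
    """Naive Ripley's K estimate (no edge correction) for 2D points."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    if n < 2:
        raise ValueError("at least two points are needed")
    # All pairwise distances, excluding each point paired with itself.
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))[~np.eye(n, dtype=bool)]
    counts = np.array([(dist <= r).sum() for r in radii])
    return area * counts / (n * (n - 1))


def clustering_score(points, radii, area):
    """Percentage of sampled radii where K exceeds the Poisson curve pi*r^2."""
    k_observed = ripley_k(points, radii, area)
    k_poisson = np.pi * np.asarray(radii, dtype=float) ** 2
    return 100.0 * np.mean(k_observed > k_poisson)


radii = [2, 4, 6, 8]
clustered = [(50 + 0.1 * i, 50 + 0.1 * j) for i in range(5) for j in range(4)]
regular = [(x, y) for x in range(5, 100, 10) for y in range(5, 100, 10)]
print(clustering_score(clustered, radii, area=100 * 100))
print(clustering_score(regular, radii, area=100 * 100))
```

In a real nucleus the area would come from the segmented nuclear ROI, and an edge correction would be advisable because spots near the nuclear border have truncated neighborhoods.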

Semi-automated measurement of transcription dynamics at the single-allele level in live cells
Random or targeted intronic integration of endogenous genes with arrays of MS2 hairpins in cells stably expressing a fluorescently tagged fusion of the MS2 capsid protein (MCP) has been instrumental in demonstrating that in mammalian cells transcription happens in bursts of activity followed by periods of inactivity 33, and that splicing of long introns is recursive 15. HTI acquisition and analysis have been used to measure the dynamics of these events in large numbers of single live cells 14,15.
We aimed to test whether the novel algorithms we developed in HiTIPS for spot finding, nucleus tracking, nucleus registration, and spot tracking could reproduce the results of our previous image analysis pipeline 15, and to show that HiTIPS can be applied to precisely quantify transcriptional dynamics in live cells at the single-cell level (Fig. 5). To this end, we ran the full HiTIPS analysis pipeline on HTI datasets from two clonal human bronchial epithelial (HBEC) cell lines, in which the ERRFI1 and KPNB1 genes have been endogenously tagged with 24xMS2 loops, enabling visualization of their nascent transcription with MCP-GFP 15. MS2/MCP-GFP labeled transcription sites in these nuclei were detected (Fig. 2, iii), registered (Fig. 2, vi), and grouped into tracks (Fig. 2, vii), and the integrated fluorescence intensity was measured at the site of transcription (Fig. 2, viii). This automated analysis resulted in the generation of 1232 fluorescence intensity traces (690 for ERRFI1-24xMS2/MCP-GFP and 542 for KPNB1-24xMS2/MCP-GFP). We further conducted a visual quality control step on the segmented transcription site traces to obtain a total of 595 traces (277 for ERRFI1-24xMS2/MCP-GFP and 318 for KPNB1-24xMS2/MCP-GFP). In agreement with previous work 15, visualization of a subset of fluorescence intensity traces for single alleles of ERRFI1 and KPNB1 revealed that KPNB1 bursts more frequently than ERRFI1, due to shorter periods of inactivity (OFF times) (Fig. 5A). Extending this analysis to a larger sub-sample of fluorescence intensity traces for each cell line (Fig. 5B, n = 150 for ERRFI1-24xMS2/MCP-GFP, and n = 150 for KPNB1-24xMS2/MCP-GFP) confirmed that the observed difference in transcriptional dynamics of KPNB1 and ERRFI1 extends to the cellular population. Further evidence supporting this observation comes from the analysis of cumulative distributions of ON and OFF times across an even larger sub-sample of all the intensity traces (Fig. 5C, n = 260 for ERRFI1-24xMS2/MCP-GFP, and n = 260 for KPNB1-24xMS2/MCP-GFP). We observed that ERRFI1 has on average significantly longer OFF times compared to KPNB1, while the distribution of ON times is not substantially different between the two genes. These results obtained with the HiTIPS pipeline are consistent with previous observations 15: differences in transcriptional dynamics between different genes in human cells correlate with variation in bursting frequency (OFF times), while ON time distributions for different genes remain largely invariant. Overall, these results indicate that HiTIPS can reliably measure the dynamics of transcription in an automated manner and for hundreds to thousands of live cells in long time-lapse experiments.
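To make the ON and OFF time measurements concrete, the sketch below extracts dwell times from a trace that has already been segmented into two states (for example by the 2-state HMM step). It is an illustration under stated assumptions, not the HiTIPS code: the function name `dwell_times`, the 0/1 state encoding, and the decision to discard the first and last runs (which are truncated by the observation window) are choices made here.

```python
import numpy as np


def dwell_times(states, frame_interval=1.0):
    """Return ON and OFF durations from a binary state sequence.

    states: sequence of 0 (OFF) and 1 (ON), one value per time point.
    frame_interval: time between frames, in the desired output unit.
    The first and last runs are dropped because their true length is unknown.
    """
    s = np.asarray(states, dtype=int)
    if s.size == 0:
        return {"on": np.empty(0), "off": np.empty(0)}
    # Indices where the state changes mark the start of a new run.
    change = np.flatnonzero(np.diff(s)) + 1
    starts = np.concatenate(([0], change))
    ends = np.concatenate((change, [s.size]))
    lengths = (ends - starts)[1:-1]
    values = s[starts][1:-1]
    return {
        "on": lengths[values == 1] * frame_interval,
        "off": lengths[values == 0] * frame_interval,
    }


trace_states = [0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0]
print(dwell_times(trace_states, frame_interval=100.0))
```

Pooling the `on` and `off` arrays over all traces of a cell line gives the samples from which cumulative distributions like those in Fig. 5C can be built.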

Discussion
We developed HiTIPS as an open-source and automated image analysis software for large HTI datasets, equipped with advanced methods for FISH and/or IF spot detection, and with novel custom algorithms for spot detection, nucleus registration, and spot registration/tracking in time-lapse datasets of live cell gene expression imaging.
Individual researchers and imaging core facilities frequently struggle to identify HTI software that combines ease of use for researchers with no programming experience with the flexibility that image bioinformatics developers need to incorporate new analysis modules and algorithms. In developing HiTIPS, we strove to find a balance between these two needs. We believe that HiTIPS achieves this by taking advantage of a flexible GUI for both HTI dataset visualization and analysis parameter optimization (Fig. 1). This functionality not only allows for the optimization of analysis methods and parameter adjustments, but also supports the exploration of multi-well plate datasets and multi-channel visualizations. In addition, HiTIPS is written in Python, one of the most commonly used open-source programming languages in the biological image analysis and machine learning fields. The modular construction of HiTIPS allows the adoption and integration of existing state-of-the-art image analysis algorithms (such as the Enhanced Gaussian Filter and Laplacian for spot finding) and deep learning models (such as Cellpose for nucleus segmentation). At the same time, the modular software architecture allows the incorporation not only of new algorithms for existing analysis modules (e.g. nuclear segmentation, spot finding, etc.), but also of completely new HTI analysis modules.
As proof of principle, we have used HiTIPS for the analysis of HTI assays designed to address biological questions related to nuclear architecture and gene expression, in both fixed and live cells. We show that HiTIPS performs robustly for nuclear segmentation, nucleus tracking, spot detection, and spot tracking. In addition, HiTIPS can be used to measure fluorescence intensity, morphology, and kinetics for nuclear compartments or markers at the single-cell level (Fig. 2). We used these HTI measurements to address a variety of questions related to 3D genome architecture (Fig. 3), centromere biology (Fig. 4), and transcriptional dynamics (Fig. 5). The results of these studies highlight HiTIPS' ability to analyze data from HTI assays incorporating various fluorescent sample preparation techniques (FISH, IF, recombinant fluorescent proteins) in both fixed and live cells.
We consider HiTIPS a user-friendly addition to the suite of open-source software platforms available for HTI analysis. As compared to some of these, HiTIPS has both advantages and limitations. ImageJ 34 has long been one of the most used and flexible open-source software packages for bioimage analysis, and it has a rich plugin ecosystem to extend its capabilities in image visualization and analysis. As a generalist analysis tool, ImageJ can clearly handle a larger number of image analysis tasks, but it is not geared toward automated high-throughput image analysis pipelines, which are the main design focus of HiTIPS. CellProfiler 18,19 is another widely adopted and flexible open-source image analysis software with extensive capabilities for HTI analysis, which provides more analysis module options, has a more flexible framework to build image analysis pipelines, and covers a wider range of HTI analysis use cases as compared to HiTIPS. In addition, other generalist open-source image analysis software tools are designed to excel at the visualization of complex multidimensional datasets, such as Napari 35 and MoBIE 36, or at cell segmentation and morphology assessment, such as GIANI 37 and Tonga 38. When compared with these latter software tools, HiTIPS uniquely provides a GUI that is specifically optimized for the selection and loading of multichannel images from the multi-well plate format common in HTI experiments. In addition, HiTIPS image analysis modules provide custom spot finding algorithms that are specifically designed for sensitive detection of FISH spots in fixed cells and fluorescent mRNA spots in live cells. Furthermore, HiTIPS also uniquely provides analysis modules for high-throughput nucleus and spot registration, spot fluorescence intensity calculations, and HMM segmentation of fluorescence traces, which are necessary elements for the automated analysis of transcription, splicing, and chromatin dynamics at the single allele level in time-lapse experiments. These modules are not all available as a single pipeline in the above-mentioned software platforms. Finally, KNIME 39 and KNIP 40 have also been used by our groups 14,15,41,42 [...]. FISH-quant v2 17, dypFISH 43, and RS-FISH 44 are open-source software applications specialized in the analysis of FISH images, and have dedicated advanced algorithms and analysis modules for spot detection. dypFISH and RS-FISH do not have nucleus segmentation capabilities, while FISH-quant v2 is closely aligned with HiTIPS in terms of design choices: it is an image analysis pipeline that includes a web GUI for visualization based on ImJoy 45, it can perform nuclear segmentation using deep learning algorithms such as Cellpose, it provides advanced spot finding algorithms for smFISH images, and it calculates spot localization features that can be used to classify cells based on specific mRNA localization patterns, the latter being a feature that HiTIPS does not provide out of the box. On the other hand, HiTIPS can run a full image analysis pipeline for live cell time-lapse experiments, whereas FISH-quant v2 is limited to fixed FISH and IF experiments. In summary, each open-source image analysis software has its own strengths and limitations based on its range of applications (e.g. generalist vs. specialized, single image vs. high-throughput, etc.), and several existing applications for bioimage analysis excel in their own application space. In this context (see Supplementary Note 3), HiTIPS is a user-friendly, GUI-based HTI analysis platform that can be used to rapidly select, load, and visualize multichannel images from multi-well plate layouts, to quickly and interactively optimize image analysis parameters, and to run batch image analysis pipelines with advanced spot finding, nucleus tracking, nucleus registration, and spot tracking for both FISH and live cell experiments.
In the future, we expect that additional algorithms for FISH spot detection, such as the ones present in FISH-quant v2 17, dypFISH 43, or RS-FISH 44, and for nuclear segmentation and tracking, will be added to HiTIPS. Furthermore, as a natural progression for HiTIPS, we also foresee the addition of modules to segment the cell body and cell membranes, to extend the range of biological questions to other cellular compartments besides the nucleus. Similarly, we envision the potential addition of analysis modules in HiTIPS to measure fluorescence texture properties, additional morphological properties, and relational/cell neighborhood properties. Given the open-source and modular software architecture of HiTIPS, we hope that the biological image analysis community will contribute to the development of these new HiTIPS features.

High-throughput DNA FISH
HCT116 RAD21-mAID-mClover (HCT116-RAD21-AID) cells 27 were grown at 37 °C in 5% CO2 in McCoy's 5A medium supplemented with 10% FBS, 2 mM L-glutamine, 100 U/ml penicillin, and 100 µg/ml streptomycin. For FISH experiments, cells were plated at a density of 8000 cells per well in 384-well imaging plates (PhenoPlate 384-well, Revvity, Cat. No. 6057500) and grown overnight. The following day, the medium was replaced with either supplemented medium containing 170 mM Auxin (Sigma-Aldrich, Cat. No. I3750) to induce the degradation of RAD21, or with medium with an equivalent amount of DMSO alone as vehicle control. The cells were then incubated with or without Auxin for 3 h and fixed in 4% PFA (Electron Microscopy Sciences, Cat. No. 15710) in PBS for 10 min. After fixation, the plates were rinsed three times in PBS and stored in PBS at 4 °C.
We conducted high-throughput fluorescence in situ hybridization (hiFISH) as previously described 4,47. BAC FISH probes were selected to hybridize to the boundary regions of the topologically associating domain (TAD) on chromosome 7 containing the EGFR gene. Fluorescently labeled BAC probes were generated by nick translation at 14 °C for 1 h and 20 min. The reaction mixture included 40 ng/ml DNA, 0.05 M Tris-HCl pH 8.0, 5 mM MgCl2, 0.05 mg/ml BSA, 0.05 mM dNTPs (including fluorescently tagged dUTP), 1 mM β-mercaptoethanol, 0.5 U/ml E. coli DNA Polymerase, and 0.5 mg/ml DNase I. The reaction was stopped by adding 1 µl EDTA per [...]. The reaction was then stored at -20 °C overnight. Next, the two probes (0.5 mg per probe) were combined, precipitated with ethanol, and resuspended in 14 µl of hybridization buffer (50% formamide pH 7.0, 10% dextran sulfate, and 1% Tween-20 in 2X SSC) per well. Cells were rinsed twice with PBS and permeabilized at room temperature for 20 min using 0.5% w/v saponin/0.5% v/v Triton X-100 in PBS. After two rinses with PBS, cells were deproteinated for 15 min in 0.1 N HCl and neutralized for 5 min in 2X SSC at room temperature. Cells were equilibrated overnight in 50% formamide/2X SSC at 4 °C. The probe mix was warmed to 72 °C prior to the hybridization reaction. Next, 14 µl/well of resuspended probe mix was added to the plate and denatured at 85 °C for 7 min, followed by immediate transfer to a 37 °C incubator for a 48-h hybridization period. Post-hybridization, plates were rinsed once at room temperature with 2X SSC, followed by three rinses with 1X SSC and 0.1X SSC, all warmed to 45 °C. Cells were stained with 3 µg/ml DAPI for 15 min, then rinsed and mounted in PBS and subsequently imaged on a high-throughput confocal microscope.

High-throughput live cell imaging of transcription
For the live cell transcription assay, human bronchial epithelial cell lines (HBEC3-KT) with a monoallelic insertion of an MS2 array in the intron of the model genes were used, as previously described 15. To enable visualization of the nascent RNA, the viral MS2 capsid protein (MCP) fused to GFP and to an NLS (nuclear localization signal) tag was stably introduced into the cells using lentiviral expression vectors 48.

High-throughput image acquisition
High-throughput imaging was performed using either a Yokogawa CV7000 or a CV8000 high-throughput spinning disk confocal microscope.
In all cases, images were corrected on the fly with Yokogawa proprietary software to subtract the camera dark background, to compensate for illumination artifacts (vignetting), and to correct for chromatic aberrations and camera alignment.

HiTIPS implementation
HiTIPS uses the PyQt5 Python module, which offers a user-friendly GUI enabling interactive data analysis. Its architecture implements multiprocessing to optimize computational efficiency, a critical aspect when dealing with large-scale bioinformatics datasets. The parallel processing scheme in HiTIPS is designed to completely analyze (nuclei segmentation, spot detection, etc.) each FOV in a separate thread. Depending on the available hardware and on the analysis workflow, parallel processing in HiTIPS can reduce the analysis time by 5- to 8-fold.
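The one-FOV-per-worker scheme can be sketched with the Python standard library. This is an assumption-laden illustration rather than the HiTIPS implementation: `analyze_fov` is a hypothetical placeholder for the per-FOV analysis, and a thread pool is used here only for portability; `concurrent.futures.ProcessPoolExecutor` can be passed as `executor_cls` for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze_fov(fov):
    """Hypothetical placeholder for the complete analysis of one FOV.

    A real implementation would load the images for this well and field,
    segment nuclei, detect spots, and return the measurements.
    """
    well, field = fov
    return {"well": well, "field": field, "n_nuclei": 0}


def run_batch(fovs, analyze=analyze_fov, n_workers=4,
              executor_cls=ThreadPoolExecutor):
    """Analyze every FOV independently, one FOV per worker at a time.

    Results are returned in the same order as the input FOV list.
    """
    with executor_cls(max_workers=n_workers) as pool:
        return list(pool.map(analyze, fovs))


fovs = [("A01", 1), ("A01", 2), ("B03", 1)]
for result in run_batch(fovs, n_workers=2):
    print(result)
```

Because each FOV is processed independently, the number of workers can be matched to the available cores without changing the analysis code.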
HiTIPS depends on several Python scientific computing libraries for numerical computation and data manipulation (SciPy, Pandas), image processing (Pillow, Matplotlib, imageio, scikit-image, and OpenCV), dynamic cell tracking (btrack), machine learning-based image segmentation and classification (DeepCell 49 and Cellpose 50), image input/output and format conversion (aicsimageio, nd2reader), and Hidden Markov Model fitting (hmmlearn). At least 8 GB of RAM are required to run HiTIPS, but 32 GB of RAM may be needed for larger FOVs and for 3D volumes. In addition, when using deep learning-based nuclear segmentation or cell tracking models in HiTIPS (i.e., Cellpose and DeepCell), the availability of graphical processing units (GPUs) substantially improves the inference speed of these models. Additional details about the specifications for typical hardware configurations can be found in Supplementary Note 1, while details about the implementation of the pipeline in code can be found in Supplementary Note 4.

Nucleus segmentation
Nucleus segmentation using images of nuclei stained with a fluorescent dye or a recombinant fluorescent nuclear protein is the key first step in the vast majority of HTI analysis pipelines. Given the high relevance of this step for HTI, a substantial amount of work in the field has been devoted to making nuclear segmentation algorithms fast, precise, and robust to fluctuations in cell confluency and to heterogeneity in nucleus morphology across different cells 51. For this reason, and to take advantage of previous advances made by other groups, we focused on integrating existing state-of-the-art nucleus segmentation algorithms into HiTIPS so that end users can easily access them and modify their parameters if needed. Accordingly, the nucleus segmentation module in the HiTIPS GUI offers a traditional CPU-based method (Supplementary Note 2, Algorithm 1), which we developed to handle cases that do not necessarily require deep learning models, which in turn require high-end graphical processing units (GPUs). In addition, HiTIPS adopts two recent deep learning-based methods for nuclear segmentation, Cellpose 50,52 and DeepCell 53. Deep learning-based nuclei segmentation models do not involve time-consuming parameter optimization, and they generally provide excellent segmentation performance on a variety of different cell lines "out-of-the-box". On the other hand, the speed of these models benefits substantially from access to GPUs, which tend to be expensive and difficult for end users to set up. Traditional image processing algorithms for nucleus segmentation can be fast if properly optimized and can handle a variety of edge cases upon expert parameter optimization. The watershed-based segmentation method is the CPU-based approach integrated into HiTIPS. It starts with image padding and noise reduction via median filtering, followed by image binarization using Li's iterative method 54. The binary image is then processed using morphological operations and a Gaussian kernel to connect fragmented nuclei. The method labels connected components and calculates the center of mass of each, creating a new mask image. A watershed transform 55 is applied using this mask and the distance-transformed image to separate adjacent nuclei effectively. Finally, a boundary image is created and resized to the original size, and any holes are filled to generate the final mask. By providing an easy selection of different nuclear segmentation methods via a GUI, HiTIPS allows users to choose and optimize the method that works best on their images and in the context of the available computational hardware infrastructure.
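A simplified version of a CPU-based watershed segmentation can be sketched with SciPy and scikit-image. This is not Algorithm 1 from Supplementary Note 2: padding, the Gaussian-kernel step, and the final resizing are omitted, and the watershed seeds are taken from a thresholded distance transform instead of per-component centers of mass. The function name `segment_nuclei` and the `median_size` and `seed_fraction` parameters are assumptions introduced here.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.filters import threshold_li
from skimage.segmentation import watershed


def segment_nuclei(image, median_size=3, seed_fraction=0.8):
    """Label nuclei in a 2D image with a threshold-plus-watershed recipe."""
    # Noise reduction with a median filter.
    smoothed = ndi.median_filter(np.asarray(image, dtype=float), size=median_size)
    # Binarization with Li's iterative minimum cross-entropy threshold.
    binary = smoothed > threshold_li(smoothed)
    # Morphological clean-up: remove specks, then fill holes inside nuclei.
    binary = ndi.binary_fill_holes(ndi.binary_opening(binary))
    # Distance to the background peaks near the center of each nucleus.
    distance = ndi.distance_transform_edt(binary)
    if distance.max() == 0:
        return np.zeros(binary.shape, dtype=np.int32)
    # Seeds (assumed simplification): cores where the distance is close to
    # its global maximum, one connected core per nucleus.
    seeds, _ = ndi.label(distance > seed_fraction * distance.max())
    # Flood the inverted distance map from the seeds, restricted to the mask.
    return watershed(-distance, seeds, mask=binary)


# Synthetic example: two overlapping disks on a dim background.
yy, xx = np.mgrid[:64, :64]
demo = np.full((64, 64), 10.0)
for cy, cx in [(32, 21), (32, 41)]:
    demo[(yy - cy) ** 2 + (xx - cx) ** 2 <= 144] = 100.0
labels = segment_nuclei(demo)
print(labels.max())
```

A single global `seed_fraction` works for nuclei of similar size; populations with heterogeneous nuclear sizes would need a per-object or local-maximum seeding strategy.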

Spot detection
HiTIPS includes morphologic, intensity, and filtering-based approaches for fluorescent spot detection. Currently, HiTIPS incorporates four different spot detection methods: Direct Thresholding, Gaussian Filter, Gaussian Laplacian, and Enhanced Gaussian Filter and Laplacian. Each method has its own strengths and limitations, which need to be considered when choosing the appropriate spot detection method for a given type of biological sample and imaging assay.
The Direct Thresholding method segments spots by thresholding the image directly, without any filtering. It is a straightforward and computationally efficient approach suitable for scenarios where spots have high contrast. However, this method may be less effective when dealing with spots that have low contrast or are close to the background intensity level. Additionally, it has limited capability to handle spots with varying intensity gradients.
The Gaussian Filter method uses a Gaussian filter to reduce noise and enhance spots. This method performs well when spots have a relatively uniform intensity distribution, and it works better than the Gaussian Laplacian method when spots are close together or overlapping. However, it may be less effective in enhancing spots with sharp intensity variations or irregular shapes. Careful consideration of parameters such as the Gaussian filter size (sigma) and thresholding parameters is necessary.
The Gaussian Laplacian method enhances spots by applying a Gaussian Laplacian filter to the input image and then segments the spots using thresholding. By using the negative lobes of the Gaussian Laplacian kernel, this method not only enhances the spots but also removes the background around each spot, improving the effectiveness of automatic thresholding. It is a relatively simple and computationally efficient method. However, it may face challenges when spots are closely located or overlapping due to limited resolution. Sensitivity to parameters such as the Laplacian filter size (sigma) and thresholding parameters should be considered.
The Enhanced Gaussian Filter and Laplacian method combines the strengths of both the Gaussian Filter and Gaussian Laplacian methods. It first applies a Gaussian filter to the input image, followed by a Gaussian Laplacian filter on the filtered image, and it finally uses fluorescence thresholding for spot segmentation. This method provides enhanced capabilities for detecting spots with varying intensity gradients and can improve overall spot detection accuracy. However, achieving optimal results may require careful tuning of filter sizes (sigma), and the choice of thresholding parameters may still impact its performance.
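The four filtering strategies described above can be illustrated with a minimal Python sketch built on `scipy.ndimage`. This is not the HiTIPS implementation: the function name `detect_spots`, the method keywords, the use of a single `sigma` for both filters, and the default threshold (mean plus three standard deviations of the filter response) are assumptions made for illustration only.

```python
import numpy as np
from scipy import ndimage as ndi


def detect_spots(image, method="log", sigma=1.5, threshold=None):
    """Return (row, col) centroids of spots segmented with one of four filters."""
    img = np.asarray(image, dtype=float)
    if method == "threshold":        # Direct Thresholding: no filtering
        response = img
    elif method == "gaussian":       # Gaussian Filter: smooth, then threshold
        response = ndi.gaussian_filter(img, sigma)
    elif method == "log":            # Gaussian Laplacian, negated so bright spots are positive
        response = -ndi.gaussian_laplace(img, sigma)
    elif method == "gaussian_log":   # Gaussian smoothing followed by Gaussian Laplacian
        response = -ndi.gaussian_laplace(ndi.gaussian_filter(img, sigma), sigma)
    else:
        raise ValueError(f"unknown method: {method}")
    if threshold is None:
        # Assumed default, not taken from the source: mean + 3 SD of the response
        threshold = response.mean() + 3.0 * response.std()
    # Connected components above threshold are treated as individual spots
    labels, n_spots = ndi.label(response > threshold)
    centroids = ndi.center_of_mass(response, labels, np.arange(1, n_spots + 1))
    return np.array(centroids, dtype=float).reshape(-1, 2)
```

Calling `detect_spots(img, method="gaussian_log", sigma=1.5)` on a 2D maximum projection returns one centroid per connected region above threshold. In practice, `sigma` and the threshold would be tuned interactively on representative images, as described in the text.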
The spot detection methods provided in HiTIPS enable the detection and localization, in the X and Y dimensions, of fluorescent spots generated by DNA/RNA FISH staining or by other biological structures in maximally projected 3D z-stack microscopy images. Subsequently, maximum intensity or Gaussian-fitted maximum intensity can be employed to estimate the spot center positions in the Z dimension of the z-stack.
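As a rough illustration of the Z localization step, the sketch below estimates the Z position of a spot from its intensity profile along the z-stack. The function name `spot_z_position` and the `(z, y, x)` axis order are hypothetical. The Gaussian fit is simplified to a closed-form fit through the brightest plane and its two neighbours, which assumes a background-subtracted profile; HiTIPS may use a different fitting procedure.

```python
import numpy as np


def spot_z_position(stack, y, x, gaussian_fit=True):
    """Estimate the z position of a spot at pixel (y, x) in a (z, y, x) stack."""
    profile = np.asarray(stack)[:, y, x].astype(float)
    k = int(np.argmax(profile))  # brightest plane (maximum intensity estimate)
    if not gaussian_fit or k == 0 or k == len(profile) - 1:
        return float(k)
    # The log of a Gaussian is a parabola, so the vertex of the parabola through
    # the peak and its two neighbours gives a sub-plane estimate of the centre.
    lo, mid, hi = np.log(np.clip(profile[k - 1:k + 2], 1e-12, None))
    denom = lo - 2.0 * mid + hi
    return float(k) if denom == 0 else float(k + 0.5 * (lo - hi) / denom)
```

The returned value is in units of z-planes and would need to be multiplied by the z-step size to obtain a physical distance.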

Nuclei tracking
Nuclei tracking can be framed as a linear assignment problem in which N_i objects in frame i are matched with N_{i+1} objects in frame i + 1. Shadow objects can be introduced to account for births (i.e., from cell division events) or deaths (i.e., cells leaving the field of view). We incorporated two cell tracking methods in HiTIPS to accommodate HTI assays using cell lines with different levels of confluency and mobility.
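The linear assignment formulation with shadow objects can be sketched as follows, using `scipy.optimize.linear_sum_assignment`. This is a generic illustration and not one of the two tracking methods incorporated in HiTIPS: the function name `link_frames`, the use of centroid distance as the only cost, and the choice of `max_dist` as the cost of a birth or death are all assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def link_frames(prev_xy, next_xy, max_dist=20.0):
    """Match nuclei centroids between two frames; return links, deaths, births."""
    prev_xy = np.asarray(prev_xy, dtype=float).reshape(-1, 2)
    next_xy = np.asarray(next_xy, dtype=float).reshape(-1, 2)
    n, m = len(prev_xy), len(next_xy)
    big = 1e9  # effectively forbids an assignment
    dist = np.linalg.norm(prev_xy[:, None, :] - next_xy[None, :, :], axis=2)
    cost = np.full((n + m, n + m), big)
    cost[:n, :m] = np.where(dist <= max_dist, dist, big)   # real-to-real links
    cost[np.arange(n), m + np.arange(n)] = max_dist        # shadow columns: deaths
    cost[n + np.arange(m), np.arange(m)] = max_dist        # shadow rows: births
    cost[n:, m:] = 0.0                                     # shadow-to-shadow is free
    rows, cols = linear_sum_assignment(cost)
    links = [(int(i), int(j)) for i, j in zip(rows, cols) if i < n and j < m]
    deaths = [int(i) for i, j in zip(rows, cols) if i < n and j >= m]
    births = [int(j) for i, j in zip(rows, cols) if i >= n and j < m]
    return links, deaths, births
```

Padding the cost matrix to a square of size N_i + N_{i+1} guarantees that every object can be assigned either to a real partner or to a shadow object, so the solver never has to force an implausible link.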
The first method we adopted [56] revisits and updates the Kalman filtering algorithm [57] and uses a Bayesian framework to improve the cell tracking accuracy and reliability. At the onset, the algorithm constructs tracklets, which are links between consecutive cell detections that do not exhibit cell division events. These tracklets from a prior frame are paired with observed cells in the current frame to form a Bayesian belief matrix, which initially holds a uniform probability of associations. Crucially, each tracklet deploys its own Kalman filter to predict the future state of a cell, basing its predictions on motion models and information from a cell state classifier. This classifier discerns nuclear morphological variations and chromatin condensation levels, which are crucial visual features in tracking. Belief updates in the matrix consider both motion evidence (using a constant velocity model) and appearance evidence (through a cell state transition matrix). The motion aspect focuses on the estimated time dimension.
The first step (Supplementary Note 2, Algorithm 4) calculates the pairwise Euclidean distance between all transcription spots in the time-projected image, and then uses single-linkage hierarchical clustering to generate an initial set of labels for the clusters. This algorithm also identifies the centroid of each unique cluster label, calculates the standard deviation of distances from the centroid for each cluster, and identifies outlier spots that exceed a user-defined threshold distance from the centroid of their cluster. Once the outliers are identified, they are assigned the label "zero". The second algorithm (Supplementary Note 2, Algorithm 5) first determines the size of each cluster (i.e., the number of spots in the cluster) and separates the clusters into "large" and "small" categories based on a user-defined threshold. Then, for each small cluster, the algorithm calculates the Euclidean distances between its points and all points in all the large clusters. If the distance to the closest point in a large cluster is less than a user-defined distance, the label of that closest point is assigned to the points in the small cluster. If not, the small cluster is labeled as an outlier. Through this process, small clusters and outliers are effectively merged into larger, more significant clusters, which streamlines the data structure and improves the interpretability of the results. The updated cluster labels are then returned. These labels correspond to tracks across time, for example of MS2/MCP-GFP spots representing sites of active gene transcription.
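The two clustering steps described above can be sketched with `scipy.cluster.hierarchy`. This is an illustrative approximation of Algorithms 4 and 5, not a transcription of them: the function names, the parameters `link_dist`, `max_radius`, `min_size`, and `merge_dist`, and the choice to leave already-flagged outliers (label 0) untouched in the merge step are assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cluster_spots(xy, link_dist=2.0, max_radius=10.0):
    """Single-linkage clustering of time-projected spots; outliers get label 0."""
    xy = np.asarray(xy, dtype=float)
    # Cut the single-linkage tree at link_dist to obtain initial labels (1..k)
    labels = fcluster(linkage(xy, method="single"), t=link_dist, criterion="distance")
    for lab in np.unique(labels):
        idx = np.flatnonzero(labels == lab)
        dist = np.linalg.norm(xy[idx] - xy[idx].mean(axis=0), axis=1)
        labels[idx[dist > max_radius]] = 0  # too far from the cluster centroid
    return labels


def merge_small_clusters(xy, labels, min_size=5, merge_dist=4.0):
    """Merge each small cluster into the nearest large cluster, else label it 0."""
    xy = np.asarray(xy, dtype=float)
    labels = np.array(labels, copy=True)
    ids, counts = np.unique(labels[labels > 0], return_counts=True)
    large_mask = np.isin(labels, ids[counts >= min_size])
    large_xy, large_labels = xy[large_mask], labels[large_mask]
    for lab in ids[counts < min_size]:
        idx = np.flatnonzero(labels == lab)
        if len(large_xy) == 0:
            labels[idx] = 0
            continue
        # Distances from every point of the small cluster to every large-cluster point
        d = np.linalg.norm(xy[idx][:, None, :] - large_xy[None, :, :], axis=2)
        nearest = np.unravel_index(np.argmin(d), d.shape)[1]
        labels[idx] = large_labels[nearest] if d.min() < merge_dist else 0
    return labels
```

Single linkage is a natural fit here because a transcription site that drifts over time produces an elongated chain of spots in the time projection, which single linkage keeps together as long as consecutive spots stay within `link_dist`.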

Integrated intensity measurement
The integrated fluorescence intensity measurement of spots includes two components: local background estimation and Gaussian mask fitting [63]. The Local Background Estimation Algorithm (Supplementary Note 2, Algorithm 6) first addresses the preprocessing of the images. This algorithm uses a least squares method to fit a 2D background plane to the fluorescence intensity values of the pixels at the border of an 11 × 11 pixel matrix centered on the location of the spot. The estimated background plane is then subtracted from the original image, thus locally correcting for potential non-uniform illumination and compensating for systematic imaging noise.
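The plane-fitting step can be sketched in a few lines of NumPy. The function name `subtract_local_background` is hypothetical, and the sketch assumes that the spot lies at least five pixels from the image edge so that the full 11 × 11 window is available.

```python
import numpy as np


def subtract_local_background(image, row, col, size=11):
    """Fit a plane to the border pixels of a size x size window and subtract it."""
    half = size // 2
    patch = np.asarray(image)[row - half:row + half + 1,
                              col - half:col + half + 1].astype(float)
    yy, xx = np.mgrid[0:size, 0:size]
    border = np.ones((size, size), dtype=bool)
    border[1:-1, 1:-1] = False  # keep only the outermost ring of pixels
    # Least-squares solution of I = a*x + b*y + c using the border pixels only
    design = np.column_stack([xx[border], yy[border], np.ones(border.sum())])
    coeffs, *_ = np.linalg.lstsq(design, patch[border], rcond=None)
    plane = coeffs[0] * xx + coeffs[1] * yy + coeffs[2]
    return patch - plane
```

Fitting on the border only keeps the spot signal itself out of the background estimate, while the planar model can follow a local illumination gradient that a constant offset would miss.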
Following the background correction, the Gaussian Mask Fitting Algorithm (Supplementary Note 2, Algorithm 7) fits a Gaussian mask to the image to isolate and analyze individual spots within the image. The Gaussian mask can either be applied statically at a given centroid or be iteratively adjusted to improve the accuracy of the fitting. The iterative process recomputes and adjusts the centroid coordinates of the Gaussian mask until the difference between the old and new centroids becomes negligible, or until the maximum iteration count is reached. The final output from this process is the centroid coordinates of the Gaussian mask and the estimated photon number, which can then be used for further intensity track analysis.
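The iterative centroid update can be sketched as below, operating on a background-corrected patch. This is a simplified illustration: the function name `gaussian_mask_fit`, the fixed mask width `sigma`, the convergence tolerance, and the use of a point-sampled (rather than pixel-integrated) Gaussian are assumptions, and the returned "photons" value is in photon units only if the image has already been converted to photon counts.

```python
import numpy as np


def gaussian_mask_fit(patch, sigma=1.5, max_iter=50, tol=1e-3, iterate=True):
    """Return (row, col, photons) from a Gaussian mask fit to a corrected patch."""
    patch = np.asarray(patch, dtype=float)
    yy, xx = np.mgrid[0:patch.shape[0], 0:patch.shape[1]].astype(float)
    # Start from the patch centre, i.e. the detected spot position
    r0, c0 = (patch.shape[0] - 1) / 2.0, (patch.shape[1] - 1) / 2.0

    def mask_at(r, c):
        return np.exp(-((yy - r) ** 2 + (xx - c) ** 2) / (2.0 * sigma ** 2))

    for _ in range(max_iter if iterate else 0):
        weighted = patch * mask_at(r0, c0)
        total = weighted.sum()
        if total <= 0:
            break
        # New centroid: intensity-weighted mean position under the current mask
        r1, c1 = (yy * weighted).sum() / total, (xx * weighted).sum() / total
        shift = np.hypot(r1 - r0, c1 - c0)
        r0, c0 = r1, c1
        if shift < tol:
            break
    mask = mask_at(r0, c0)
    mask /= mask.sum()
    photons = (patch * mask).sum() / (mask ** 2).sum()
    return r0, c0, photons
```

Setting `iterate=False` corresponds to the static option, in which the mask stays at the supplied centroid and only the integrated intensity is estimated.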

CENPC clustering score calculation
In our analysis, we employ a derivative of Ripley's K-function [32], specifically designed to estimate the degree of spatial clustering at the single-cell level. This statistical measure, denoted as K(r), is defined as:
$$K(r) = \frac{A}{N(N-1)} \sum_{\substack{(i,j) \in C \\ i \neq j}} I(d_{ij} < r)$$
In this equation, A represents the nucleus area for each cell, N is the total number of CENPC spots in the nucleus, d_{ij} stands for the Euclidean distance between the i-th and j-th spots, and r is a predefined radius within which we evaluate the clustering. The indicator function, I(·), returns 1 if d_{ij} < r, and 0 otherwise. The calculation of K(r) involves the summation over all unique pairs of points (i, j) in the cell (C). The resulting sum is then normalized by multiplying it by the ratio between A and the product of N and N−1.
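The definition of K(r) given above, without edge correction, can be evaluated directly with NumPy. The function name `ripley_k` is hypothetical, and the sketch assumes that the sum runs over all ordered pairs with i ≠ j, which matches the N(N−1) normalization.

```python
import numpy as np


def ripley_k(points, area, radii):
    """Uncorrected Ripley's K for 2D spot coordinates within a nucleus of given area."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    d_pairs = d[~np.eye(n, dtype=bool)]  # all ordered pairs with i != j
    scale = area / (n * (n - 1))
    return np.array([scale * np.count_nonzero(d_pairs < r) for r in radii])
```

For example, four spots at the corners of a unit square in a nucleus of area 4 give K = 0 at r = 0.5 (no pairs closer than 0.5) and K = 4 at r = 1.5 (all twelve ordered pairs counted).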
To correct for edge effects, which can potentially bias results for spots close to the periphery of the nucleus ROI, we employ Ripley's edge-corrected K-function. The correction to the K-function adds a weighting term for each point that is inversely proportional to the area of the region accessible to other points within the specified radius, r, without crossing the boundary of the study area. This results in a correction factor that adjusts for the reduced probability of finding neighboring points near the edges of the region under study. The K(r) calculations were performed using the Astropy package in a Jupyter notebook separate from HiTIPS. Finally, the difference between a Poisson point process (representing complete spatial randomness) and the actual data from the cell is computed. The percentage of radii where the measured value of K(r) is higher than the K(r) for the Poisson process is then calculated as a clustering score on a per-cell basis.
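The per-cell clustering score can be sketched as follows, given K(r) values already computed for one cell. The function name `clustering_score` is hypothetical. The sketch uses the theoretical value K(r) = πr² for a homogeneous 2D Poisson process and does not reproduce the edge correction or the Astropy-based calculation used in the study.

```python
import numpy as np


def clustering_score(k_values, radii):
    """Percentage of radii at which the measured K(r) exceeds the Poisson K(r)."""
    k_values = np.asarray(k_values, dtype=float)
    # Complete spatial randomness in 2D: K(r) = pi * r^2
    k_poisson = np.pi * np.asarray(radii, dtype=float) ** 2
    return 100.0 * float(np.mean(k_values > k_poisson))
```

A score near 100 means that the spots are more clustered than expected at random at almost every radius tested, while a score near 0 indicates dispersion relative to a random pattern.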

Statistical analysis
Statistical analysis for the DNA FISH and for the CENPC clustering data was performed using the R statistical programming language and these R packages: tidyverse, data.table, fs, and ggthemes.
Statistical analysis for the MS2-GFP live cell data was performed in Python 3.9 using these libraries: pandas, Seaborn and Matplotlib.

Figure 1. Examples of the HiTIPS GUI. (A) Representative screenshot of the HiTIPS GUI for HTI dataset selection and on-demand image loading, with metadata loading and integration in various formats, including CellVoyager and Micro-Manager. (B) Representative screenshot of the image visualization controls in the HiTIPS GUI, including fluorescence channel toggling, z-projected views for 3D z-stacks, and fluorescence intensity visualization adjustment. (C) Representative screenshot of the overlaid display in the HiTIPS GUI of nuclei mask borders (red) and spot detection in 2 different channels (red and green circles) output from HiTIPS image analysis modules.

Figure 2. Schematic representation of the full HiTIPS image analysis workflow. (i) Image and metadata loading, leading to (ii) Nuclear segmentation, (iii) Spot finding, (iv) Nucleus tracking, (v) Single nucleus timelapse generation, and (vi) Frame-to-frame nuclei registration processes. (vii) Spot assignments to specific tracks are determined before (viii) Measurement of track fluorescence intensities, culminating in (ix) Segmentation of gene ON and OFF states by fitting a 2-state Hidden Markov Model (HMM) to the fluorescence intensity tracks.

Figure 3. HiTIPS measures distances between genomic loci labeled with High-Throughput DNA FISH probes. (A) Schematic representation of the experiment to measure spatial distances of two genomic loci located at the base of a TAD encompassing the EGFR gene on Chr 7 and detected by FISH probes in different channels (Probe A and Probe B). Treatment with Auxin in HCT116-RAD21-AID cells leads to rapid proteolytic degradation of the RAD21-AID fusion protein via the ubiquitin/proteasome pathway. (B) Representative 3D maximally projected images of HCT116-RAD21-AID cells stained with DNA FISH probes A and B targeted to the 5' and 3' boundaries of the EGFR TAD, respectively, with the results of the HiTIPS spot finding algorithm overlaid as red and blue circles, respectively. Scale bar: 5 microns. (C) Density plots of minimum spatial distances between the A and B DNA FISH probes in 1874 cells treated with Auxin or DMSO.

Figure 4. HiTIPS analysis of centromeric clustering. (A) 3D maximally projected images of HCT116-Cas9 cells reverse transfected in 384-well plates for 72 h with either a scrambled non-targeting siRNA (siScramble) or an siRNA against the condensin II subunit NCAPH2 (siNCAPH2). Transfected cells were stained by IF with DAPI and a CENPC antibody, and imaged using a high-throughput confocal spinning disk microscope. The border of the segmented nuclei masks is overlaid on the image and colored in white. CENPC spots detected by HiTIPS are overlaid as magenta circles in the lower images. Scale bar: 10 microns. (B) Density plot of cell-level CENPC spot clustering scores for siScramble and siNCAPH2 as calculated by HiTIPS. Higher values of the clustering score indicate more clustering of CENPC spots in the nucleus. (C) Density plot of the number of CENPC spots per cell for siScramble and siNCAPH2 as calculated by HiTIPS.